Extended Performance Graphs for Cluster Retrieval
Abstract
Performance evaluations in probabilistic information retrieval are often presented as Precision-Recall or Precision-Scope graphs, which avoid the otherwise dominating effect of the embedding irrelevant fraction. However, precision and recall values as such offer an incomplete overview of the information retrieval system under study: information about system parameters such as generality (the embedding of the relevant fraction), random performance, and the effect of varying the scope is missing. In this paper three cluster performance graphs are presented. Where complete ground truth is available (both cluster size and database size), the Cluster Precision-Recall (Cluster PR) graph and the Generality-Precision=Recall graph are proposed. Where cluster sizes are unknown (and recall therefore cannot be computed), the double logarithmic Cluster Precision Window graph is proposed.
1 Shortcomings of presently used retrieval performance measures
Performance characterization of content-based image and audio retrieval often borrows from performance figures developed over the past 30 years for probabilistic text retrieval. Landmarks in the text retrieval field are the books [12] and [11] as well as the proceedings of the annual ACM SIGIR [7] and NIST TREC [14] conferences. In the area of probabilistic retrieval, the results of performance measurements are often presented in the form of Precision-Recall (or Recall-Precision) graphs and Precision-Scope graphs. Each of these standard performance graphs gives the user incomplete information about how the IR system will perform for various cluster sizes and various embedding sizes. Generality (the influence of the relevant fraction) as a system parameter hardly seems to play a role in performance analysis.
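The quantities behind these graphs can be sketched in a few lines. The following is a minimal illustration using the standard definitions (precision as relevant hits divided by scope, recall as relevant hits divided by cluster size); the function name `precision_recall_at_scope` and the example ranking are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def precision_recall_at_scope(
    ranked_relevance: List[bool], cluster_size: int, scope: int
) -> Tuple[float, float]:
    """Return (precision, recall) for the top `scope` items of a ranking.

    ranked_relevance[i] is True when the item at rank i + 1 is relevant.
    cluster_size is the total number of relevant items in the database.
    """
    if not 0 < scope <= len(ranked_relevance):
        raise ValueError("scope must lie between 1 and the ranking length")
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    hits = sum(ranked_relevance[:scope])  # relevant items inside the scope
    return hits / scope, hits / cluster_size


# Hypothetical ranking of 10 items for a query whose cluster has 4 relevant items.
ranking = [True, False, True, True, False, False, False, False, False, False]
p, r = precision_recall_at_scope(ranking, cluster_size=4, scope=5)
print(f"precision={p:.2f} recall={r:.2f}")
```

Neither value depends on the database size, which is why a Precision-Recall or Precision-Scope graph alone says nothing about generality.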
Although generality may be left out as a performance indicator when competing methods are tested under constant generality conditions, it appears to be neglected even where generality varies widely (a wide range of cluster sizes in one specific database being the most frequently encountered example). That the generality of a cluster of relevant items in a large embedding database is often ≈ 0.0 does not mean that its exact low level no longer matters. A continually growing embedding around a constant-size cluster of relevant items will eventually lower the overall precision-recall curve (for the user) to unacceptably low levels, as is shown in Figure 1.
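A small sketch can make the role of generality concrete. It assumes generality is the cluster size divided by the database size, and it uses the fact that under a uniformly random ranking the expected precision at any scope equals generality. It shows only this random baseline shrinking as the embedding grows, not the behaviour of any particular retrieval system; the function names and the sizes used are hypothetical.

```python
def generality(cluster_size: int, database_size: int) -> float:
    """Fraction of the database that is relevant to the query."""
    if not 0 < cluster_size <= database_size:
        raise ValueError("need 0 < cluster_size <= database_size")
    return cluster_size / database_size


def expected_random_precision(cluster_size: int, database_size: int) -> float:
    """Expected precision of a uniformly random ranking, at any scope.

    Each retrieved item is relevant with probability cluster_size / database_size,
    so the expectation equals generality.
    """
    return generality(cluster_size, database_size)


# Constant cluster of 10 relevant items inside a growing embedding (hypothetical sizes).
cluster_size = 10
database_sizes = [100, 1_000, 10_000, 100_000]
baselines = [expected_random_precision(cluster_size, n) for n in database_sizes]
for n, b in zip(database_sizes, baselines):
    print(f"database size {n:>7}: generality = random precision = {b:.5f}")
```

Although every value for the larger databases rounds to roughly 0.0, the values differ by orders of magnitude, which is the sense in which the exact low level of generality still matters.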
Similar resources
Sparse Clustering for Probability Un-Weighted Graphs Mining
Probabilistic graphs have significant importance in data mining. The correlations endure amid the adjacent edges in different probabilistic graphs. Graph clustering is used in exploratory data analysis at data compression, information retrieval and image segmentation. The existing work presented a Partially Expected Edit Distance Reduction (PEEDR) and Correlated Probabilistic Graphs Spectral (...
An Effective Path-aware Approach for Keyword Search over Data Graphs
Keyword search is known as a user-friendly alternative to structured languages for retrieving information from graph-structured data. Efficient retrieval of relevant answers to a keyword query and effective ranking of these answers according to their relevance are the two main challenges in keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...
Respect My Authority! HITS Without Hyperlinks, Utilizing Cluster-Based Language Models (arXiv:0804.3599v1 [cs.IR], 22 Apr 2008)
We present an approach to improving the precision of an initial document ranking wherein we utilize cluster information within a graph-based framework. The main idea is to perform re-ranking based on centrality within bipartite graphs of documents (on one side) and clusters (on the other side), on the premise that these are mutually reinforcing entities. Links between entities are created via c...
Artificial Intelligence and Query Execution Methods in the VizIR Framework
The article introduces the architecture of the querying components of the visual information retrieval framework VizIR. A major design goal was to assure adaptability and extensibility in manifold ways. VizIR components can be arbitrarily combined to build extensive applications. The framework provides various visual content descriptors, similarity measures and query models. Moreover, the platf...
Publication date: 2001